I. INTRODUCTION


International comparisons of student achievement are important tools for 
national educational policy making. The results of comparisons of educational 
achievement with other countries are always highly visible and bring public 
attention to the conditions of our education system. Studies placed under such strong
national spotlights must be prepared for critical review in all aspects -- sample
design and execution as well as conceptual development.


The first rule of sample surveys is that they be carried out under
conditions that permit the laws of statistical probability to be applied. The
credibility of sample surveys, as scientific samples of the U.S. student population, is
determined by how closely the surveys followed standard statistical sampling
procedures, not by the size of the final sample or by other criteria. Sample surveys
with low response rates are quickly dismissed by the statistical community, no matter
how much effort has otherwise been spent on the study. For this reason, critiques of
the Second International Mathematics Study have focused on the response rates
obtained in the U.S. sample.


Achieving impeccable response rates for surveys of student achievement
is demanding because such surveys do not have the power of a legal mandate for use
of classroom time. Participation in sample surveys is voluntary for schools and
school districts in the United States, and surveys may require considerable time and
effort by teachers and students. In fact, high response rates have been the exception
rather than the rule for many school-conducted studies, as will be shown again in
this paper.


The specific purpose of this paper is to review the sampling procedures
followed in the United States portion of the Second International Mathematics Study
(SIMS) and to use all available evidence from comparable surveys to examine the
accuracy of the estimates of student achievement found in that study. First, the
sampling methods and response rates of SIMS will be compared with previous international
surveys of student achievement carried out in the United States. It will be shown that
other surveys conducted by the International Association for the Evaluation of
Educational Achievement (IEA) have obtained low participation rates from both
schools and students compared with surveys conducted by other organizations.
Secondly, the findings of the SIMS will be compared with survey results from other
national surveys. This section of the report will show that even though the response
rates of SIMS were low, the survey estimates of student achievement levels in
mathematics and the international ranking of the United States compared with other
countries obtained in the SIMS are accurate.


The Second International Mathematics Study was conducted on two populations 
in the United States: Population A was classes of mathematics in 8th grade and 
Population B was classes of students in the most advanced secondary school 
mathematics classes. More information was available for the 8th grade sample than
for the 12th grade, permitting a more detailed examination of the 8th grade sample
than of the 12th grade.


There have been two previous papers concerning the sampling procedures for
SIMS. The first was prepared by the international study director for SIMS, Robert
Garden of the New Zealand Department of Education (Garden, 1987). In addition, a review of
the sampling procedures used in the United States Second International Mathematics
Study was produced in 1985 by Darrell Bock and Bruce Spencer for the National
Academy of Sciences (Bock & Spencer, 1985).


II. REVIEW OF SURVEY COOPERATION RATES 


This section will review the published information available about the survey
response rates for five international studies: the IEA First International
Mathematics Study (FIMS), the IEA Second International Mathematics Study (SIMS),
the IEA First International Science Study (FISS), the IEA Second International
Science Study (SISS), and the International Assessment of Educational Progress
(IAEP) conducted by the Educational Testing Service.


IEA First International Mathematics Study (FIMS) 


The first IEA assessment of mathematics was conducted by the IEA in twelve
countries in 1963. The U.S. study was carried out at the University of Chicago.


Apparently no detailed information on the sample sizes, response rates, and
sampling procedures has survived for FIMS. We learned from the international
report (Husén, 1967) that the sampling for the study was conducted in 3 stages
(chapter 9, page 150) and that the sample sizes and design effects were as
shown in Table 1. The design effects are approximations based on the formula

    design effect = 1 + (b - 1) roh

using the information provided in the published document, where b is the average
cluster size and roh is the intraclass correlation. The average cluster size b for the
five target populations was estimated by taking the ratio of the number of students to
the number of schools in the achieved sample.
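The approximation above can be illustrated with a minimal sketch. The function names and the input values below are hypothetical and are not taken from Table 1; the sketch only shows how b, roh, and the design effect relate, including the algebraic inversion used to recover roh from a published design effect.

```python
def average_cluster_size(n_students: int, n_schools: int) -> float:
    """Average cluster size b: students per school in the achieved sample."""
    return n_students / n_schools


def design_effect(b: float, roh: float) -> float:
    """Approximate design effect: 1 + (b - 1) * roh."""
    return 1.0 + (b - 1.0) * roh


def roh_from_design_effect(deff: float, b: float) -> float:
    """Invert the same formula to recover roh from a design effect (requires b > 1)."""
    return (deff - 1.0) / (b - 1.0)


# Hypothetical illustration only: 6,000 students in 200 schools, roh = 0.2.
b = average_cluster_size(6000, 200)
deff = design_effect(b, 0.2)
print(b, round(deff, 2), round(roh_from_design_effect(deff, b), 2))
```

Because the design effect grows linearly with the cluster size, large intact classrooms combined with even a modest roh can reduce the precision of a school-based sample substantially.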


The sample sizes were selected with the purpose of making reliable national
estimates for the cognitive achievement levels, the major intent of the study. These
samples were not large enough to allow detailed classification analysis of small
population groups. Later international studies carried out in the United States
followed a similar design.


IEA Second International Mathematics Study (SIMS) 


The study of primary interest in this paper is SIMS. This section will review
the sampling procedures of SIMS and examine the estimates of student and teacher
characteristics by comparing them with surveys carried out in the United States
during the same period and under similar conditions.


SIMS was conducted during the 1981-82 school year. The International
Coordinator of the study was Robert Garden at the New Zealand Department of
Education in Wellington. The U.S. Coordinator was Kenneth Travers, College of
Education, University of Illinois at Urbana-Champaign. The International Sampling
Referee was Malcolm Rosier at the Australian Council for Educational Research,
Melbourne. In the U.S., SIMS was administered at the beginning and at the end of the
school year using rotated forms in a matrix-sampling design.


The sampling and data collection activities were carried out by the Survey
Research Laboratory of the University of Illinois during the 1981-82 school year.
The sample for SIMS was a complex three-stage probability sample of public and
private schools for each of two target populations: schools containing grade 8 and
schools containing grade 12 students.


TABLE 1

Descriptive Information for International Student Achievement Surveys: United States

                                                                      Approx.
                     Achieved          Cooperation        Approx.    effective
Target               sample            rate               design     sample
population           Sch    Student    Sch    Student     Roh        effect     size

2. SIMS, 1982 (2)
     Grade 8
     Grade 12
3. FISS, 1970 (3)
     Age 10
     Age 14
4. SISS, 1986 (4)
     Grade 5
     Grade 9
     Adv. Bio.
     Adv. Chem.
     Adv. Phys.
5. IAEP, 1988 (5)
     Math            200    905
     Science         200    859

Source: (1) G.F. Peaker, Chapter 9, "Sampling," in T. Husén, International Study of
Achievement in Mathematics, Vol. II, table 9.7, pg. 161. (2) Robert Garden, The Second
International Mathematics Study: Sampling Report, NCES, 1987. The school district response
rate for the SIMS was 50% for grade 8 and 48% for grade 12 (93 districts). (3) Richard M. Wolf,
Achievement in America, pg. 31. (4) Alan P. Duffer and Frank J. Potter, Final Report: Second
International Science Study, prepared for Teachers College, Columbia University. Research
Triangle Institute, Research Triangle Park, North Carolina, November, 1986. Science
Achievement in Seventeen Countries: A Preliminary Report, 1988. (5) Benjamin F. King, A
World of Differences, An International Assessment of Mathematics and Science Technical
Report, Part I: Sampling, Quebec, Canada, May 1989: Centre de Recherche et de Developpement en
Mesure et Evaluation, Université Laval. Roh was estimated algebraically for the first
mathematics and science studies from information provided on the design effect.


The primary sampling unit was school districts, the secondary sampling unit
was schools within sampled districts, and the tertiary sampling unit was intact
classrooms within sampled schools. School districts were stratified into seven strata. Six
strata were formed for public school districts by crossing two levels of region (east-central
and south-west) with three levels of metropolitan status (city, suburb, and
other). The seventh stratum was a separate sample of private schools. Schools within
PSUs were stratified into six strata by crossing public versus private control with grade
span (8th grade only, 12th grade only, and both 8th and 12th grade). School districts and
schools were selected with probability proportional to a measure of size, but classrooms were
selected with equal probability. Two schools were selected per sampled school district
and two classrooms were selected per sampled school.
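The two selection rules named above can be sketched as follows. This is an illustrative assumption, not the procedure documented for SIMS: the source states only that districts and schools were drawn with probability proportional to a measure of size and classrooms with equal probability. The systematic variant of PPS selection, the function name, and the enrollment figures are all hypothetical.

```python
import random


def pps_systematic(sizes, n, rng):
    """Systematic selection of n units with probability proportional to size.

    Returns the indices of the selected units. A unit whose size exceeds the
    sampling interval can be selected more than once.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    selected = []
    cumulative = 0.0
    idx = 0
    for k in range(n):
        point = start + k * interval
        # Advance until the selection point falls inside unit idx.
        while idx < len(sizes) - 1 and cumulative + sizes[idx] <= point:
            cumulative += sizes[idx]
            idx += 1
        selected.append(idx)
    return selected


rng = random.Random(1982)                    # fixed seed so the sketch is repeatable
school_enrollments = [120, 450, 300, 80, 600, 250]   # hypothetical measures of size
chosen_schools = pps_systematic(school_enrollments, 2, rng)

# Classrooms within a chosen school are drawn with equal probability.
classrooms = ["class A", "class B", "class C", "class D"]   # hypothetical
chosen_classrooms = rng.sample(classrooms, 2)
print(chosen_schools, chosen_classrooms)
```

Under PPS selection, larger schools are more likely to enter the sample, which is why sampling weights are needed to produce unbiased national estimates.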


The school district, school, and student cooperation rates (after replacement), and
roh for the posttest are contained in Table 1 (based on results reported in Robert
Garden, 1987). The design effects are approximations which were calculated with the
formula described above (however, they differ from the design effects reported in
the SIMS Sampling Report). The average cluster size b was calculated as the ratio of
the number of students to the number of schools in the achieved sample, divided by
two because two classes were sampled per school.


One way to judge the adequacy of a U.S. sample is to estimate the effective
sample size for each target population, that is, the size of a simple random sample that
has the same level of sampling accuracy as the complex sample. The International
Sampling Manual for the IEA Reading Literacy Study suggests that the effective
sample size should equal at least 400 for each target population. The effective sample
sizes in Table 1 were estimated by dividing the achieved sample size by the design
effects for each target population.
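The calculation of the effective sample size can be sketched as follows. The function name and the inputs are hypothetical illustrations; they are not the achieved sample sizes or design effects from Table 1.

```python
def effective_sample_size(achieved_n: int, design_effect: float) -> float:
    """Size of the simple random sample with the same sampling accuracy."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return achieved_n / design_effect


# Hypothetical illustration: 3,000 students and a design effect of 6.0.
n_eff = effective_sample_size(3000, 6.0)
print(n_eff)          # 500.0
print(n_eff >= 400)   # True: meets the suggested minimum of 400
```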


Another way to judge the adequacy of the U.S. SIMS sample is to use a minimum
criterion for the cooperation rate. The IEA International Sampling Manual for the
IEA Reading Literacy Study requires national samples to obtain a cooperation rate of
at least 85% for both schools and students.


The effective sample sizes for the SIMS are estimated as 506 and 570 for grades 8
and 12, respectively. Using the criteria of the IEA Reading Literacy Study, the sample
sizes in the SIMS were more than adequate. However, the U.S. SIMS sample may not
have been representative of the target population. This is because the district,
school, and student cooperation rates were below the current IEA standard of 85%.
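The two criteria can be applied together in a short sketch. The thresholds (400 and 85%) and the SIMS figures used here (effective sample sizes of 506 and 570, district response rates of 50% and 48%) are the ones quoted in this paper; the function and variable names are hypothetical, and the school and student rates are omitted because only the district rates are quoted in the text.

```python
MIN_EFFECTIVE_SAMPLE = 400     # minimum effective sample size cited above
MIN_COOPERATION_RATE = 0.85    # minimum cooperation rate cited above


def check_sample(effective_n, cooperation_rates):
    """Return (size_ok, cooperation_ok) for one target population."""
    size_ok = effective_n >= MIN_EFFECTIVE_SAMPLE
    cooperation_ok = all(
        rate >= MIN_COOPERATION_RATE for rate in cooperation_rates.values()
    )
    return size_ok, cooperation_ok


sims_grade_8 = check_sample(506, {"district": 0.50})
sims_grade_12 = check_sample(570, {"district": 0.48})
print(sims_grade_8)    # (True, False)
print(sims_grade_12)   # (True, False)
```

Both populations pass the size criterion and fail the cooperation criterion, which is the pattern described above.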
IEA First International Science Study (FISS)


The first IEA study of science was conducted in 17 countries in 1970 as part of 
the IEA Six Subject Survey of science, reading, literature, English as a foreign 
language, French as a foreign language, and civic education. The participating 
countries were Australia, Belgium, Chile, England, Germany, Finland, France, 
Hungary, India, Iran, Italy, the Netherlands, New Zealand, Scotland, Sweden, 
Thailand, and the United States. The U.S. study results were published in 1978 in Wolf, 
(19xx). 


The sample for the study was a multi-stage sample of school districts (a listing
of schools was not available). Communities were grouped into three groups of town
size, three categories of socioeconomic status, and public and private schools. Within
school districts, students were chosen from a list provided by the school.


Second International Science Study (SISS) 


In 1983 SISS was conducted in 24 countries, including 10 of those which
participated in FISS. The International Coordinator and Sampling Referee was
Malcolm Rosier at the Australian Council for Educational Research. In the U.S. the
National Coordinator was Willard Jacobson at Teachers College, Columbia University
in New York.


The U.S. first collected data in the SISS in 1983. However, due to an
unacceptably low response rate, data were collected again in 1986. Only the results of
the second round of data collection are discussed in this paper.


The sampling and data collection activities were carried out in the United
States by Research Triangle Institute (RTI) between May 1985 and October 1986.
The sample for the Second International Science Study was a
complex three-stage probability sample of public and private schools for each of
three target populations: schools containing grade 5, schools containing grade 9, and
schools with advanced biology, chemistry, or physics classes. The primary sampling unit was
counties, or groups of counties; the secondary sampling unit was schools within
sampled PSUs; and the tertiary sampling unit was intact classrooms within sampled
SSUs. Within the 70 selected PSUs, schools were placed within sixteen strata (public
versus non-public, metropolitan versus non-metropolitan, and four geographic
regions: northeast, south, north-central, and western).


Counties and schools were selected with probability proportional to a measure
of size, but classrooms were selected with equal probability. Two schools were
selected per PSU for each target population, and within each school one intact
classroom was selected. Separate school sampling frames were used in each target
population. For Population 3, RTI used a screening procedure to create a biology
school frame, a physics school frame, and a chemistry school frame. Selection
probabilities and sampling weights were adjusted to compensate for the initial
screening process. Nonparticipating PSUs and schools were replaced with randomly
selected units within each stratum. The school and student cooperation rates, after
replacement, and roh (the intraclass correlation) are contained in Table 1 (based on
results reported in Science Achievement in Seventeen Countries: A Preliminary
Report, 1988).


International Assessment of Educational Progress (IAEP) 


IAEP was conducted during the 1987-88 school year. The study was conducted
by the Educational Testing Service (ETS) and funded by the National Center for
Education Statistics and the National Science Foundation. Items and methodologies
were used from the National Assessment of Educational Progress (NAEP). The study
included five countries and four Canadian provinces. Since sampling was by age and
not grade, intact classrooms were not sampled. In each participating country 63
mathematics and 60 science items were administered to 13-year-olds, and matrix
sampling was not used.


In the U.S. the sampling and data collection activities were carried out by ETS.
Two booklets, one for math and one for science, were administered along with the
other spiraled NAEP booklets. The U.S. sample for the IAEP was a complex three-stage
probability sample. The primary sampling units were metropolitan statistical areas,
a single county, or groups of contiguous counties. The secondary sampling units
were schools within PSUs and the tertiary sampling unit was students within
schools. PSUs were stratified by region and percent minority. One PSU was selected
with probability proportional to a measure of size from each of the strata. Schools
were stratified by public-private and percent minority. For schools, systematic
sampling was used with probabilities proportional to a measure of size. Within
schools a systematic sample of 13-year-old students was obtained. The achieved
sample sizes, cooperation rates, roh, b, and approximate design effects for the U.S.
portion of the IAEP are contained in Table 1.


The effective sample sizes for the IAEP are estimated to be 423 and 430 for math 
and science, respectively. The effective sample sizes meet the IEA standard of at least 
400. Furthermore, the cooperation rates for school and student each exceed the IEA 
standard of 85%. 


III. MARKER VARIABLES


Comparison of responses to student and teacher questionnaires of the SIMS with
other national surveys can help provide guidance on biases that might have resulted
from the low response rates achieved. This section will examine national surveys of
student characteristics, teacher characteristics, and student achievement levels.


A. Student and Teacher Characteristics 


Sex of Student


From Table 2 we see that the SIMS sample appears to include a slightly higher
proportion of girls than was reported in the Current Population Survey (CPS),
suggesting a slight tendency toward undercoverage of boys. The difference in the percentage
of females between SIMS and the CPS is larger than the difference between SIMS and the 1986
High School and Beyond Survey (HSB) of high school sophomores. The HSB and the
SIMS were both school-based surveys while the CPS is a household survey. The lower
coverage of males in the school-based surveys suggests that survey practices, or
weighting methods, may result in different coverage rates for girls
compared to boys at this grade level.


Region 


SIMS was more likely to include students living in the Midwest or West regions
than is found in the population. Consequently, the Northeast and South are
underrepresented in the SIMS sample. About 10% of the sample would have to be
redistributed between the regions to achieve the same regional distribution as is
found in the CPS sample. The HSB distribution among U.S. regions is also different
from the CPS sample. About 6% of the HSB population would need to be redistributed
between regions to obtain the same distribution as in the CPS sample. Thus, neither
SIMS nor HSB appears to have the same distribution of the sample among U.S. regions
as CPS.
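One common way to express the share of a sample that would have to be moved between regions is the index of dissimilarity, half the sum of the absolute differences between two percentage distributions. Whether this is the calculation used for the 10% and 6% figures above is an assumption, and the regional shares below are hypothetical placeholders, not the values from Table 2.

```python
def share_to_redistribute(sample_shares, population_shares):
    """Index of dissimilarity between two distributions over the same regions."""
    if set(sample_shares) != set(population_shares):
        raise ValueError("both distributions must cover the same regions")
    return 0.5 * sum(
        abs(sample_shares[region] - population_shares[region])
        for region in sample_shares
    )


# Hypothetical regional shares (each distribution sums to 1.0).
sample = {"Northeast": 0.18, "South": 0.28, "Midwest": 0.30, "West": 0.24}
population = {"Northeast": 0.22, "South": 0.34, "Midwest": 0.26, "West": 0.18}

print(round(share_to_redistribute(sample, population), 2))   # 0.1
```

A value of 0.10 means that one tenth of the sample would have to change region for the two distributions to match.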


TABLE 2

Characteristics of 8th Grade Students in the IEA Second International Mathematics Study


                      IEA 8th grade              National survey
Student               mathematics students   CPS students          HSB sophomores
characteristics                              (7 to 13 yrs old)
                                             1983                  1986


Sex 


Percent male 
Percent female 


Region 


Total 
Northeast 
South 
Midwest 
West 


Race 


Percent Black 10.5 14.1 Lo 
Source: U.S. Bureau of the Census, Current Population Reports, Social and Economic
Characteristics of Students, 1983, Series P-20, no. 413; Calvin Jones, et al., High School and
Beyond 1980 Sophomores First Follow-up (1982) Data File User's Manual, NCES (83-214), April
1983. The CPS estimates for sex and race are for 8th grade students (Table 15) and the
estimates for region are for students 7 to 13 years old (Table 9).


Race of Student


SIMS appears to have a low representation of the Black population. The
proportion of Blacks in the sample, 10.5%, is below the 14% reported in the Current
Population Survey. This rate may have resulted from regional differences in response rates or from
lower participation by Blacks within sampled schools. It is interesting to note that
the HSB also reports a smaller proportion of Blacks than found by the Census Bureau
(suggesting that participation rates within schools may be a factor). But, with the
information available, it is impossible to determine whether these differences are
due to differences in rates of response to each of the surveys by schools serving low
income populations or to student participation rates within schools.


Characteristics of Teachers


Table 3 presents a few comparisons of characteristics of 8th grade mathematics
teachers in SIMS with nationally representative samples of all elementary and secondary
school teachers in the United States. It shows that the age, work experience, and sex
ratios of teachers in SIMS are very close to those found in a national survey of
elementary and secondary school teachers. The sex ratio cannot be compared exactly
to existing surveys because it is not possible to separately identify 8th grade
mathematics teachers in the national sample. However, the ratio of women to men in
secondary schools appears to match that for the 8th grade mathematics teachers very
closely.


TABLE 3


Teacher Characteristics in the IEA Second International Mathematics Study (SIMS)
and in NCES Surveys of Public and Private School Teachers


SIMS 
8th grade teachers NCES survey of 
1982 1986 


Characteristic 


Median age (years 


Median years experience 


Percent female 


Elementary 
Secondary 


Source: ack g 2 xperience 3 ate eache 2 
1985-86. The U.S. figures for teachers were published separately for public and private
schools. An estimate for all teachers was obtained by weighting by the proportion of all teachers
in public and private schools. Sixteen percent of all teachers were in private schools. These
data include teachers at any grade and in all classes. Information on average age and experience
for 8th grade teachers was not available separately from the NCES 1986 survey.
Summary


This comparison of a few background characteristics of students and teachers
with results from other national studies shows that the students in the SIMS sample
may be slightly more likely to be female, residing in the Midwest and West, and less
likely to be Black than the total population. The comparisons of age, experience, and
sex of teachers with characteristics of teachers from similar surveys found no
evidence that the SIMS sample is unrepresentative of U.S. mathematics teachers.


B. Comparisons of Student Achievement 


The purpose of SIMS was to obtain national estimates of student cognitive
achievement in mathematics. Is there any evidence from other surveys of
mathematics that the resulting U.S. sample contains a biased estimate of the cognitive
achievement of U.S. students? Comparisons of the results of SIMS can be made with
FIMS, two recent state samples which used the identical SIMS test items, and a recent
national sample of 13-year-olds that used test items from the SIMS.


First International Mathematics Study


Some of the test items were identical in the first and second IEA studies of
mathematics, permitting a comparison of test scores in the United States between 1965
and 1982. The difference in percentage correct for the 36 identical test items in the
two studies was small and less than could be attributed to sampling error (Table 4).
This surprising consistency in test scores is in line with the lack of change noted for
13-year-olds reported by the National Assessment of Educational Progress in the United
States between 1973 and 1986 (Arthur Applebee, Judith Langer, and Ina Mullis,
Crossroads in American Education: A Summary of Findings, Princeton, NJ, Educational
Testing Service, February, 1989, pg. 9). Thus, the consistency between the results of
the two international studies in the U.S. lends confidence that SIMS produced an
appropriate estimate of the student achievement levels in mathematics in 1982.
TABLE 4


Eighth Grade Achievement by Topic on Common Items from the First and Second IEA
Mathematics Studies: 1965 and 1982


                 Number of        Percent correct
                 items            1965      1982      Difference

Total
Arithmetic
Algebra
Geometry
Statistics
Measurement


Source: 
NCES, Contractor's Report. Ma 


State Replications 


Another source of independently derived student achievement data that is
useful for determining the validity of the SIMS is state representative surveys.
Achievement levels of students in each of the states do not differ greatly from
national samples and thus we might safely make inferences of similarity between the
national SIMS and a similar test conducted in a particular state. Evidence that
national test scores are similar to state scores for achievement tests is shown in
state comparisons of SAT scores. For all states in which 50% of high school seniors
take the test, average scores are within a 15 point range (see Digest, Table 92 and
Table 5). Therefore, we would not expect the national estimates from SIMS to differ
greatly from these state estimates.


As a further examination of the differences between states, state scores for
student achievement for eight southern states on NAEP tests for mathematics and the
average SAT scores for the state are shown in Table 5. The Southern Regional
Education Board/National Assessment of Educational Progress (SREB) tests were
administered to state samples of 11th grade public school students. The SREB test
scores for Florida and Virginia are very close to the national average. Also, the SAT
test scores in Florida and Virginia are close to the national average. The states of
Florida and Virginia replicated the achievement tests of SIMS in their states.
Comparisons made with the SREB test and SIMS should show only small, or no,
detectable differences.


It is interesting to note that the SAT scores for four of the states shown in
Table 5 are much higher than the national average. The student participation rate
on the SAT tests in these states is very small, less than 10%. Apparently, in states
with low student participation in the SAT, average state results are strongly
biased upward as a measure of the level of student achievement for that state.


The test scores shown in Table 5 for Florida and Virginia give us confidence
that any comparison of the SIMS national results with samples of Florida and
Virginia would show only small, or no, detectable differences from an unbiased
national average. A large difference between the test scores for each state and
those for a national sample could lead us to question the accuracy of one of the
samples.


Two states, Florida and Virginia, administered the SIMS test items to samples of
8th grade students in 1988. The mean test scores for five mathematics subtests for the
8th grade in Florida and Virginia are shown in Table 6. The similarity of the test
results for the two states and the national averages for 1982 is striking. The greatest
difference is four percentage points and for most tests it is within 2 percentage
points. These differences are much less than those observed between countries. Thus,
these independent state studies provide further evidence that the test scores from the
SIMS are representative of U.S. levels of achievement.
TABLE 5
Average Mathematics Scores for 11th Grade Public School Students in Eight States
Participating in the 1987 Southern Regional Education Board (SREB)/National
Assessment Project and in the Scholastic Aptitude Test


                   SREB
                   mathematics     Standard     %
                   score           error


Nation             289.0

Arkansas
Florida            294.3
Louisiana          283.1
North Carolina     288.0
South Carolina     285.9
Tennessee          286.6
Virginia
West Virginia      283.6

38 
a 
47 
49 
8 
a4 
7 


=-=- OK OK KS = © S 
Se OROOK NO 


Source: A report of the Southern Regional Education Board/National Assessment of
Educational Progress 1987 Program with Arkansas, Florida, Louisiana, North Carolina, South
Carolina, Tennessee, Virginia, and West Virginia, Measuring Student Achievement: Comparable
Test Results for Participating SREB States and the Nation. Atlanta, GA, 1987, Appendix B. The
tests were set with a mean of 289 with a standard deviation of 40. Snyder, Thomas, Digest of
Education Statistics 1988, National Center for Education Statistics, Washington, DC, 1988; Table 
2. 
TABLE 6
Percent of Items Correct for 8th Grade Samples in Two States and in the IEA Second
International Mathematics Study (SIMS)


Florida Virginia 
Number 
of items 


Total 


Arithmetic 
Algebra 
Geometry 
Measurement 
Statistics 


Source: Summary Report for the United States, Second International Mathematics Study, NCES,
Contractor's Report. May, 1985. Edwin McClintock, Mathematics Achievement in Florida's
Middle/Junior High Schools and Advanced Classes in Senior High Schools, Florida Department of
Education, no date. Edgar L. Edwards, Jr., Replicating the Second International Mathematics
Study in Virginia: A Progress Report. Paper presented at the Annual Meeting of the American
Education Research Association, New Orleans, April, 1988.


ETS INTERNATIONAL ACHIEVEMENT SCALES 


Another international study of mathematics carried out in 1988 provides an
opportunity to compare the achievement score rankings on an international test for
four countries. The ETS conducted an international study with six countries and four
Canadian provinces. Curriculum specialists in each country chose test items in
mathematics and science from the U.S. National Assessment of Educational Progress
(NAEP) item pool. The resulting tests were administered in the spring of 1988 to a
sample of 13-year-olds attending school in each participating country. This section
will compare results of the test scales for countries and provinces that participated
in SIMS and IAEP. The next section will examine differences in response to five test
items that were included in the United States data collection for both studies.


The sample selection method for the ETS study in the United States was carried
out as part of the main sample of 13-year-olds of the 1988 NAEP. The population was
defined as an age sample of 13-year-olds; one-third were in the 7th grade and
two-thirds were in the 8th grade. This differs from SIMS, which was defined as a sample
of 8th grade classrooms.
The results for 13-year-old students in the United States, United Kingdom,
British Columbia, and Ontario, Canada are shown in Table 7 along with test scores for
SIMS 8th grade students. Results are shown for each of the five sub-scales of the
mathematics test: arithmetic, algebra, geometry, measurement, and statistics. The
scales for the two studies were not forced to be the same. One is the average percent
correct of 180 items and the other is the average percent correct of 54 different
items; therefore, the values cannot be compared directly. However, the rank order of
the countries on each of the mathematics sub-scales should indicate whether the two
surveys can provide the same basic results on international comparisons of
achievement levels.


The test scores from each study are shown in Table 7 for the United States,
Great Britain, British Columbia, and Ontario, Canada. The rank order of the four
educational systems for each mathematics subtest is shown in the second half of that
table. The first row of the scores and of the rank scores is a total score on that test.
The rank order of the total score shows that British Columbia was ranked first and the
United States ranked fourth in both studies.
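Because the two scales cannot be compared directly, the comparison rests on rank order alone. A minimal sketch of that step follows; the system labels and scores are hypothetical placeholders, not the values in Table 7, and the helper name is an assumption. Ties are not handled, since none are assumed here.

```python
def rank_order(scores):
    """Map each system to its rank, where 1 is the highest mean percent correct."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical mean percent correct for four systems on two different scales.
study_one = {"System A": 58.0, "System B": 54.0, "System C": 52.5, "System D": 48.0}
study_two = {"System A": 70.0, "System B": 65.0, "System C": 66.0, "System D": 61.0}

ranks_one = rank_order(study_one)
ranks_two = rank_order(study_two)

# Systems holding the same position in both studies.
agreeing = [name for name in ranks_one if ranks_one[name] == ranks_two[name]]
print(agreeing)   # ['System A', 'System D']
```

In this illustration the first and last positions agree across the two scales even though the middle positions swap, which is the kind of pattern the rank comparison is meant to reveal.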


The rank order of countries on some subtests is not the same as for the total
score, probably because the number of items for the subtests is small. In arithmetic,
for example, British Columbia was ranked first in both the IEA and the IAEP studies
while the United States ranked third in the IEA and fourth in the IAEP studies.
The United States ranked fourth on all five IAEP subtests and on only two
SIMS subtests. Thus, if we had examined the rank ordering of subtests on SIMS only,
we would have concluded that the U.S. students had relatively higher achievement
than would be concluded from the IAEP study.
TABLE 7
Average Percent Correct for the 1982 IEA Second International Mathematics Study
(SIMS) of 8th Grade Students and the 1988 Educational Testing Service International
Assessment of Educational Progress (IAEP) for Students 13 Years Old


Canada 
British United 
Columbia Ontario 9 States 
IAEP SIMS 
Mean % of 
items correct 


Total 52.0 | 64.5 


Arithmetic 
Algebra 
Geometry 
Measurement 
Statistics 


Countr rank 
Total 


Arithmetic 
Algebra 
Geometr 
Measurement 
Statistics 


TABLE 8 


Number of Test Items for Each Mathematics Subtest in the 1982 IEA Second 
International Mathematics Study (SIMS) and in the 1988 International Assessment of 
Educational Progress (IAEP) 


Subtest          IAEP     SIMS 
Total                      180 


Arithmetic                  62 
Algebra                     32 
Geometry                    42 
Measurement                 26 
Statistics                  18 


1 Includes 8 items identified as "logic and problem solving" in the IAEP. 
In order to allow a comparison of identical items under the testing conditions 
of the IAEP, the Educational Testing Service included six SIMS test items in the IAEP 
study. The items were added in pairs at the end of three sections of the IAEP booklets. 
The percent of students who answered the items correctly in the IAEP was probably 
affected by placement of the items at the end of the test section, since not all students 
completed each test. While the percentage of students who omitted an item in the 
SIMS was only 2.5%, the percent of students in the IAEP who did not reach the item 
ranged from 10 to 14%. Thus, the percent of all students taking the test who chose 
the correct answer is likely to be significantly lower in the IAEP than in the SIMS. 


The distribution of responses, including distractors, for five items is shown in 
Table 9. The correct answer is underlined. One of the SIMS items included by ETS was 
misprinted and is not shown here. 


The percentages correct are significantly lower in the IAEP study than in the 
IEA SIMS study. The differences between the two studies range from 7 percentage 
points to 17 percentage points. When the non-responses are omitted and P values 
recalculated for both studies, the effect of the high non-response in the IAEP is 
reduced, but not eliminated: differences in P values still range from 6 to 12 
percentage points (see Table 9). 
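

The adjustment for item non-response can be illustrated with a short Python sketch. 
It assumes that the P value is the percent correct among students who actually 
responded to the item, that is, the percent correct divided by the share of students 
who neither omitted nor failed to reach the item. The function name and the input 
percentages are hypothetical and are not taken from Table 9. 

```python
def adjusted_p_value(pct_correct, pct_nonresponse):
    """Percent correct among students who responded to the item.

    Assumed formula: 100 * correct / (100 - nonresponse), where both
    inputs are percentages of all students taking the test.
    """
    if not 0 <= pct_nonresponse < 100:
        raise ValueError("non-response must be in the range [0, 100)")
    return 100.0 * pct_correct / (100.0 - pct_nonresponse)


# Hypothetical item: NOT figures from Table 9.
sims_correct, sims_nonresponse = 50.0, 2.5    # few omissions, as in SIMS
iaep_correct, iaep_nonresponse = 40.0, 12.0   # many students did not reach the item

raw_gap = sims_correct - iaep_correct
adjusted_gap = (adjusted_p_value(sims_correct, sims_nonresponse)
                - adjusted_p_value(iaep_correct, iaep_nonresponse))

print(f"raw gap: {raw_gap:.1f} points, adjusted gap: {adjusted_gap:.1f} points")
```

In this made-up case the adjustment narrows the gap between the studies without 
closing it, which is the pattern reported for the five common items. 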


Interestingly, the percentage distribution of responses to the distractors was 
about the same in each study. While that fact lends some support to the reliability 
of the sample selection and test administration for the SIMS, it is not sufficient 
evidence that the two samples are identical. 


Thus, the results of the trial of IEA items in a new, carefully drawn national 
sample do not lend support to the hypothesis that the SIMS sample is an unbiased 
sample of U.S. 8th grade students. The lower values of correct response found in the 
IAEP study suggest that the SIMS could have been over-estimating the knowledge of 
American students compared with other countries. However, the comparisons of 
SIMS with IAEP do not lend themselves to strong conclusions about the reliability of 
SIMS, because the placement of the items on the ETS test instrument caused a dramatic 
difference in response rates to the items and altered the percentage of correct 
answers. Also, purists could argue that the test would be expected to give different 
rates because its sample included an entire age group, rather than a grade sample, or 
because the test was conducted in a different year (1988 compared to 1982). Given 
these large differences, the similarities between the two studies are even more 
remarkable. 

TABLE 9 

Percent of Students Responding to Answers of Five Mathematics Items Common to the 
1982 IEA Second International Mathematics Study (SIMS) of 8th Grade Students and 
the 1988 Educational Testing Service International Assessment of Educational Progress 
(IAEP) for Students 13 Years Old 


(Unweighted, correct answer is underlined) 


Item number 


IEAS AEP: IEA 


100.0 100.0 100.0 


tal 
11.4 
0.2 
28.6 


7.0 
14.0 
yes 
28.1 


os 
re 
60.3 
6.3 
of7 


IAEP 


100.0 


ee 
8.9 
43.0 
8.4 
8.9 


45 


IEA aelAER 


100.0 100.0 


Deer 
BOS 
py. Fe} 

8.6 

4.5 


6.2 
Plead 
48.5 

1.3 


3.4 


IEA 


100.0 


5.8 
14.1 
oA 

8.4 


IABP: @aulBAr LLAEP 


100.0 100.0 100.0 


12.6 
1S=5 
47.0 
10.2 


16.9 
14.6 
28.0 
20,3 

6.7 


18.1 
1375 


35 


31.4 


NA includes omit and did not reach the item. 
The P-value is calculated as the percent correct, adjusted for item non-response. 


Source: Unpublished tabulations from the Public Use Data Tape of the IAEP 1988, and Li-Chu 
Chang and Judith Ruzicka, Second International Study of Mathematics, Technical Report I: Item 
Level Achievement Data for Eighth and Twelfth Grades, Contractor's report to the 
National Center for Education Statistics, Ma 1985. 
SUMMARY AND CONCLUSION 


The purpose of this paper was to review the adequacy of the sampling 
procedures carried out for the IEA Second International Mathematics Study (SIMS) in 
the United States in 1982. We reviewed sampling procedures in previous IEA surveys 
and in a recent ETS international survey in the U.S., and compared "marker 
variables" in the SIMS to other national surveys for both non-cognitive and 
cognitive items. There has been no replication of the entire survey, but aspects of 
the survey have been carried out in other national samples. 


Insufficient survey evidence was located for 12th grade mathematics students 
to perform a thorough evaluation of the 12th grade United States SIMS sample. Thus, 
this paper has been limited to a discussion of the U.S. 8th grade sample. 


We found evidence that the response rate for the U.S. SIMS was lower than 
would be expected for a national survey that would be used to draw important policy 
conclusions. The SIMS sample was found to include more females, more students from 
the Midwest and South, and fewer Blacks than did other national samples. Teacher 
characteristics were not found to differ from those in other national 
surveys. The achievement levels for five mathematics topics (arithmetic, algebra, 
geometry, measurement, and statistics) for 8th grade students were very close to those 
of the first mathematics study and not significantly different from those of two recent 
state samples. A recent national survey of mathematics carried out by ETS for the 
International Assessment of Educational Progress (IAEP) found that the achievement 
scores in the SIMS were somewhat higher than in the 1988 survey. The slightly higher 
scores of SIMS were found by comparing the rank order of four countries (or provinces) 
that participated in each international study and also by comparing responses to five 
test items common to both studies. 


In summary, the paper found no evidence that the results of the SIMS would 
lead to grossly misleading interpretations about the achievement of U.S. 8th 
grade students compared with other countries. There was some evidence that the 
achievement scores of the national survey could be biased slightly upward, but that 
finding was not supported by all available evidence. 